The causes and consequences of Alzheimer’s disease: phenome-wide evidence from Mendelian randomization

Alzheimer’s disease (AD) has no proven causal and modifiable risk factors, or effective interventions. We report a phenome-wide association study (PheWAS) of genetic liability for AD in 334,968 participants of the UK Biobank study, stratified by age. We also examined the effects of AD genetic liability on previously implicated risk factors. We replicated these analyses in the HUNT study. PheWAS hits and previously implicated risk factors were followed up in a Mendelian randomization (MR) framework to identify the causal effect of each risk factor on AD risk. A higher genetic liability for AD was associated with medical history and cognitive, lifestyle, physical and blood-based measures as early as 39 years of age. These effects were largely driven by the APOE gene. The follow-up MR analyses were primarily null, implying that most of these associations are likely to be a consequence of prodromal disease or selection bias, rather than the risk factor causing the disease.

Alzheimer's disease (AD) has no proven causal and modifiable risk factors, or effective interventions. We report a phenome-wide association study (Phe-WAS) of genetic liability for AD in 334,968 participants of the UK Biobank study, stratified by age. We also examined the effects of AD genetic liability on previously implicated risk factors. We replicated these analyses in the HUNT study. PheWAS hits and previously implicated risk factors were followed up in a Mendelian randomization (MR) framework to identify the causal effect of each risk factor on AD risk. A higher genetic liability for AD was associated with medical history and cognitive, lifestyle, physical and blood-based measures as early as 39 years of age. These effects were largely driven by the APOE gene. The follow-up MR analyses were primarily null, implying that most of these associations are likely to be a consequence of prodromal disease or selection bias, rather than the risk factor causing the disease.
Late-onset Alzheimer's disease is an irreversible neurodegenerative disorder which accounts for the majority of dementia cases 1 . Despite major private and public investments in research, there are no effective treatments for preventing the disease 2 . Many risk factors and biomarkers have been identified to be associated with the risk of Alzheimer's disease 3 . Most treatments (99.6%) developed to halt Alzheimer's disease failed in phase I, II, or III trials 4 . One explanation for these failures is that the identified risk factors and drug targets are a consequence of Alzheimer's disease rather than underlying causes.
Genetic epidemiologic methods, such as Mendelian randomization (MR), can potentially provide more reliable insights into the causal mechanisms underlying the associations between risk factors and disease 5 . To date, hypothesis-driven MR studies have found mixed evidence for a causal role of cardiovascular risk factors in the development of Alzheimer's disease [6][7][8] .
Phenome-wide association studies (PheWAS) are a hypothesisfree method, similar to genome-wide association studies (GWAS), which estimate the associations between a genotype or polygenic risk score (PRS) and the phenome 9 . PheWAS can potentially elucidate the phenotypic consequences of Alzheimer's disease, and critically when in the life course, these effects emerge.
Here, we estimated the associations of genetic liability for Alzheimer's disease and the phenome by age to identify the earliest effects of the disease process (i.e., genetic liability to Alzheimer's disease as a risk factor). We then tested whether the identified variables were a cause or a consequence of Alzheimer's disease using two-sample MR (i.e., phenome associated with Alzheimer's disease genetic liability as the exposure).
Association of Alzheimer's disease polygenic risk score and the phenome We examined the effects of genetic liability to Alzheimer's disease on the UK Biobank phenome, using 18 single-nucleotide polymorphisms (SNPs), robustly and independently associated with Alzheimer's disease (P ≤ 5 × 10 −8 ) (Fig. 1A). Phenome-wide association analyses were performed within each age tertile of UK Biobank.  3-5. Results for continuous outcomes are in terms of a 1 SD change of inverse rank normal transformed outcome and log-odds or odds for binary or categorical outcomes. A higher PRS for Alzheimer's disease was associated with own diagnosis and family history of dementia, diagnoses of cardiovascular diseases, and self-reported history of high cholesterol and pure hypercholesterolemia. Furthermore, participants with a higher PRS had an increased risk of using cholesterol-lowering drugs, in addition to beta-blockers and aspirin in all age tertiles ( Supplementary Fig. 3). A higher PRS was associated with lower body mass index and various body fat measures, lower diastolic blood pressure and higher spherical power in the oldest participants (i.e., the strength of lens needed to correct focus) (Fig. 2). In addition, on average, participants with higher PRS performed worse and took longer to complete cognitive tests in all age tertiles (39-72 years) (Fig. 3). Participants with a higher PRS also had a higher weighted-mean mode of anisotropy in the left inferior fronto-occipital fasciculus for participants aged 53-72 years (Fig. 3). There was evidence of an association between higher PRS and blood cell composition markers and these associations increased with age (Fig. 4). On average, the parents of participants with a higher PRS for Alzheimer's disease died at a younger age ( Supplementary Fig. 4). There was strong evidence that a higher PRS was associated with healthier dietary choices (Supplementary Fig. 4) and lifestyles (e.g., frequent exercise) in the two oldest tertiles (ages 53-72 years) ( Supplementary Fig. 5). For previously implicated factors in Alzheimer's disease, a higher PRS was associated with higher systolic blood pressure only in participants aged 39-53 years and higher pulse pressure for all age ranges. There was some evidence of an association between the PRS and a lower number of pack-years of smoking for the oldest participants (Supplementary Fig. 5).
Sensitivity analysis. We replicated the top PheWAS hits from the oldest age tertile (i.e., where most associations were observed) in the Nord-Trøndelag Health Study (HUNT). In participants aged 62-72 years, of the 165 variables identified in the UK Biobank PheWAS, we replicated 32 variables with adequate precision for the age-stratified analysis, 20 of which were directionally consistent and replicated at P ≤ 0.05. The effects of genetic liability to Alzheimer's disease on blood-based biomarkers and physical measures in HUNT closely mirror those in UK Biobank (Figs. 5 and 6). Other replicated effects included a lower odds of the participant's mother having diabetes, dietary habits such as a higher oily fish intake and fresh fruit intake, and lifestyle habits such as frequent sleeplessness/insomnia (Supplementary Figs. [7][8]10). Supplementary Figs. 6-10 show forest plots for all measures. We repeated the analysis, estimating the associations of the PRS and the 21,849 variables in UK Biobank for the entire sample (there are additional phenotypes due to the higher occurrence of events in all participants). We identified the effects of Alzheimer's disease genetic liability on the variables detected in the age-stratified analysis with larger precision and additional variables . For example, in the analysis using the entire sample, a higher PRS was associated with metabolic dysfunction phenotypes such as diabetes and obesity ( Supplementary Fig. 11); a higher volume of gray matter in the right and left intracalcarine and supracalcarine cortices (Supplementary Fig. 13); and additional blood-based biomarkers (Supplementary Fig. 13).
When we repeated the analysis by removing SNPs tagging the APOE region from the PRS using the whole sample, we could not replicate most of the hits detected in the oldest tertile. The non-APOE PRS was associated with higher odds of own and family diagnosis of Alzheimer's disease ( Supplementary Fig. 15) and lower odds of family history of chronic bronchitis/emphysema ( Supplementary Fig. 16). There was evidence that the non-APOE PRS was associated with worse performance in cognitive tests ( Supplementary Fig. 17).

Two-sample Mendelian randomization of UK Biobank phenotypes on Alzheimer's disease
We performed a two-sample MR of the identified PheWAS variables on the risk of Alzheimer's disease (i.e., the reverse model to PheWAS). The strongest associations following correction for multiple testing are shown in Fig. 7. We found evidence that a one SD higher genetically predicted whole body fat-free and water mass decreased the risk of Alzheimer's disease (OR: 0.78; 95% CI: 0.69, 0.88 and OR: 0.81; 95% CI: 0.71, 0.91, respectively). A one SD higher genetically predicted forced vital capacity resulted in 22% lower odds of risk for Alzheimer's disease (OR: 0.78; 95% CI: 0.67, 0.91) (Supplementary Table 12). Furthermore, a higher genetically predicted basal metabolic rate was protective for Alzheimer's disease risk (OR: 0.75; 95% CI: 0.66, 0.85). We observed that a higher genetic liability of doing more moderate physical activity (at least 10 minutes) but not self-reported vigorous activity increased the odds of developing Alzheimer's disease (OR: 2.69; 95% CI:1.39, 5.17 and OR: 1.02; 95% CI: 0.33, 3.18, respectively) (Supplementary Table 16). For previously implicated risk factors for Alzheimer's disease, we found a higher genetic liability for having a college degree and Alevel qualifications reduced the risk of Alzheimer's disease (Supplementary Table 17).
Assessing pleiotropy. To evaluate the validity of the MR assumptions, we quantified pleiotropy using heterogeneity statistics and pleiotropyrobust methods (e.g., MR Egger regression). Due to the large sample size of the exposures, the instrument strength of all SNPs was relatively high (F > 30). However, the SNPs used for each body measurement implied highly heterogeneous effects on the risk of Alzheimer's disease (all heterogeneity Q statistic P < 1.99 × 10 −5 ). This heterogeneity may be due to horizontal pleiotropy . The causal effect estimates for forced vital capacity and basal metabolic rate were also heterogenous (Supplementary Table 12  Assessing causal direction. We used Steiger filtering to evaluate the direction of causation between Alzheimer's disease and the phenotypes identified in the PheWAS 10 . We found little evidence that SNPs explained more variance in the outcome than the exposure for most results reported in the two-sample MR section (Supplementary Data 7-18).

Discussion
In our study, we conducted a hypothesis-free phenome-wide scan to investigate how and at what age the Alzheimer's disease PRS affects the phenome. The effects of a higher genetic liability for Alzheimer's disease are stronger in participants of aged 62-72 years, although the direction of effect is similar across age groups. We investigated whether the effects observed are likely to be causes or consequences of the disease process using two-sample bidirectional MR. We found evidence that a minority of the variables identified in the PheWAS are likely to causally affect the liability to Alzheimer's disease. Of the variables associated with Alzheimer's disease genetic liability in the PheWAS, these included basal metabolic rate, forced vital capacity, whole body fat-free and whole body water mass, and self-reported moderate physical activity. Of the factors previously implicated in Alzheimer's disease risk, these included A level/AS qualifications and college degree qualifications (Fig. 7).
The PheWAS suggested that increased genetic liability for Alzheimer's disease affected a diverse array of phenotypes such as medical history, brain-related phenotypes and physical, lifestyle and bloodbased measures. However, these effects appear to be primarily driven by variation in the APOE gene. Our sensitivity analysis, excluding the APOE region, replicated only the effects for family history of Alzheimer's disease and some cognitive measures. These findings are in line with a recent study which showed that a higher PRS excluding APOE was particularly deleterious for the age of AD onset in APOEε4 carriers, with no evidence of such an effect in APOEε4 non-carriers 11 . Furthermore, these results are consistent with observational studies [12][13][14][15][16][17] and studies in APOE-deficient mice [18][19][20][21] , which demonstrate the multifunctional role of APOE on longevity-related phenotypes such as changes in lipoprotein profiles 18,22,23 , neurological disorders 24 , type II diabete 19 , altered immune response 20 , and increased markers of oxidative stress 21 . Therefore, this strongly suggests that the effects of A. Phenome-wide associaƟon study

PRS for Alzheimer's disease
Trait (e.g. body mass index) X Alzheimer's disease

Confounders
Fig. 1 | Study design for the phenome-wide association study of Alzheimer's disease genetic liability and follow-up Mendelian randomization of identified phenotypes on Alzheimer's disease. Diagram (A) describes our study design when conducting a phenome-wide association study, and diagram (B) describes our study design when using Mendelian randomization. In A, the polygenic risk score for Alzheimer's disease may either have a downstream causal effect on the trait (e.g., body mass index), or it may affect the trait through pathways other than through Alzheimer's disease (i.e., pleiotropic effects). Diagram (B) describes our follow-up analysis using Mendelian randomization to establish the causality and directionality of the observed associations. In B, we test the hypothesis that the trait (e.g., body mass index) causally affects liability to Alzheimer's disease, provided that the conditions (i), (ii), and (iii) are satisfied. The polygenic risk score for the trait of interest is a valid instrument in that (i) the single nucleotide polymorphisms for a trait are strongly associated with the trait they proxy (relevance), (ii) there are no confounders of the single nucleotide-outcome relationship (independence), and (iii) the single nucleotide polymorphisms only affect the outcome via their effects on the trait of interest (exclusion restriction).
Alzheimer's PRS on the phenome (e.g., atherosclerotic heart disease) are likely to be due to biological pathways related to APOE. Previous observational studies have reported conflicting evidence on the association of cardiovascular risk factors with Alzheimer's disease, including hypertension. The results depend on the age at which these risk factors were measured [25][26][27] . Similarly, in our study, a higher PRS for Alzheimer's disease was associated with lower body mass index and body fat in participants of ages 53-72 years and lower diastolic blood pressure in the participants of ages 62-72 years. In agreement with some previous MR studies 6,7,28,29 , we found little evidence that body mass index and blood pressure causally affect the risk of developing Alzheimer's disease. Hence, the association observed in the PheWAS between the PRS, lower body fat measures, and diastolic blood pressure is likely to reflect the prodromal disease process. We found evidence that a higher self-reported number of days of moderate physical activity increased the odds of Alzheimer's disease. A study 30 found directionally consistent effects with the results of our analysis, but the evidence that moderate to vigorous physical activity affected the risk of Alzheimer's disease was weak. We found the genetic liability for Alzheimer's disease associated with several variables involving inflammatory pathways such as selfreported wheeze/whistling, monocyte count, and blood-based measures. This agrees with previous evidence of genetic correlations between Alzheimer's disease and asthma 31 and longitudinal studies 32,33 . In our study, red blood cell indices show the earliest evidence of association with the genetic liability of Alzheimer's disease. Still, we found little evidence that these measures caused Alzheimer's disease using MR, indicating that cell composition changes may be an early consequence of Alzheimer's disease pathophysiology. Previous studies found genetic variants associated with red blood cell distribution width are linked to autoimmune disease and Alzheimer's disease 34,35 . Once the function of the specific variants involved in specific biological pathways related to Alzheimer's disease is further elucidated, pathway-specific PRS may be used to inform the causality of the biological phenotypes identified in our PheWAS, such as blood-based biomarkers. We found evidence that a PRS for Alzheimer's disease was associated with a lower fluid intelligence score, as previously reported 36  Our study shows that the most likely reason for the contrast between the primarily null MR findings in our study, and the vast evidence base for Alzheimer's disease risk factors in observational studies, is that none of the factors identified to be associated with Alzheimer's disease in such studies are likely to be causal. Instead, they are either symptoms of prodromal Alzheimer's disease (i.e., biased by reverse causation) or spurious associations due to confounding and/or selection bias. This conclusion is corroborated by the findings observed in the PheWAS, which are similar to those in observational studies (such as body mass index and sleep disturbances). In addition, our study's lack of positive findings was not due to insufficient statistical power, as the effects were precisely estimated. Unlike PRS and association studies, MR aims to disentangle causality from correlation. Observational studies are prone to biases such as confounding and reverse causation. On the   contrary, as MR uses genetic variants as proxies for the tested exposures, the possibility of confounding and measurement error is negligible 5 . Hence, our study highlights that most of the risk factors identified in observational literature are likely a response to Alzheimer's disease pathological processes. The large sample of UK Biobank provided unparalleled statistical power to investigate the phenotypic manifestation of a higher genetic liability for Alzheimer's disease by age group. Furthermore, the systematic approach of searching for effects using PheWAS reduces the bias associated with hypothesis-driven investigations.
The Alzheimer's disease PRS may have horizontal pleiotropic effects on different traits and disorders, which may result in heterogeneity. MR and instrumental variable estimators for binary outcomes also cannot assume a constant treatment effect; as such it is expected that the MR estimates if sufficiently powered, would be heterogeneous 40  mass and physical activity, as individuals with a higher body mass index or infrequent physical activity and those with higher values of the Alzheimer's PRS are less likely to survive and participate in UK Biobank. The PRS for Alzheimer's disease in our analysis was associated with lower age at recruitment, suggesting that older people with higher score values are less likely to participate. Hence, considering these limitations, the variables that may be associated with selection or survival 41 should be interpreted with caution. In this PheWAS, we identified that a higher genetic liability for Alzheimer's disease is associated with 165 of the 15,402 UK Biobank variables. MR analysis follow-up showed evidence that only seven of these factors were implicated in the etiology of Alzheimer's disease. We found little evidence that the remaining phenotypes examined are likely to modify the disease process, but the association with the Alzheimer's disease PRS is likely due to reverse causation or selection bias. Further research should exploit the full array of potential relationships between the genetic variants implicated in Alzheimer's disease, intermediate phenotypes, and clinical phenotypes by using other omics and phenotypic data to identify possible biological pathways changing the risk of Alzheimer's disease.

Study design
Our analysis proceeded in two steps. First, we ran a PheWAS of the Alzheimer's disease PRS and all available variables in the UK Biobank, stratifying the sample by age. Second, we followed up all variables associated with the PRS in a bidirectional MR analysis. We outline the research questions answered by the PheWAS and the MR approach in Fig. 1.

Sample description
UK Biobank is a population-based study of 503,325 people recruited between 2006 and 2010 from across Great Britain 46,47 . This work was done under application number 16729 (version 2 genetic data [500 K with HRC imputation] and phenotype dataset 21753). In Supplementary Fig. 1, the flowchart shows the number of participants removed at each stage of the quality control pipeline. The UK Biobank study resource has ethical approval and its own ethics committee (https:// www.ukbiobank.ac.uk/learn-more-about-uk-biobank/governance/ ethics-advisory-committee). A full description of the study design, participants, and quality control (QC) methods has been published 48,49 . Briefly, participants were excluded due to familial relatedness and non-Caucasian ancestry. A total sample of 334,968 remained after QC ( Supplementary Fig. 1).

Polygenic risk score
We constructed a standardized weighted PRS including singlenucleotide polymorphisms (SNPs) associated with Alzheimer's disease at P ≤ 5 × 10 −8 for UK Biobank participants, based on the summary statistics from a meta-analysis of the International Genomics of Alzheimer's Project (IGAP) 50 , the Alzheimer's Disease Sequencing Project (ADSP) 51 and the Psychiatric Genetics Consortium (PGC) 52 (24,087 cases and 55,058 controls). SNPs were clumped using r 2 > 0.001 and a physical distance for clumping of 10,000 kb. A polygenic risk score including 18 genetic variants was calculated for each participant with genetic data using PLINK (version 1.9). Each score was calculated from the effect size (logarithm (log) odds)-the weighted sum of 18 alleles associated with Alzheimer's disease within each participant. Our primary analysis used the PRS, including variants near the APOE gene (Chr 19: 44,400-46,500 kb) 53 . The APOE region explains a large proportion of the variance in the polygenic risk score (R 2 = 84%). The PRS was standardized by subtracting the mean and dividing it by the standard deviation (SD) of the PRS.

Main analysis
The full UK Biobank sample was divided into three age-stratified subsamples (n = 111,656 in each tertile) with the aim of examining the agevarying effects of the PRS for Alzheimer's disease. We performed

Previously implicated risk factors
A levels/AS levels X X - College degree X X - Fig. 7 | Association of Alzheimer's disease polygenic risk score with the phenome, and estimated effect of each phenotype using Mendelian randomization. These findings showed evidence of association in the MR framework, following a correction for multiple testing using a strategy controlling for the false discovery rate. + andindicate the direction of the coefficient for phenotypes associated with Alzheimer's disease using two-sample MR. X represents associations that were consistent with the null.
PheWAS within each tertile. Age, sex, and the first ten genetic principal components were included as covariates.

Outcomes
The Biobank data showcase enables researchers to identify variables based on the field type (http://biobank.ctsu.ox.ac.uk/showcase/list. cgi). There were 2655 fields of the following types: integer, continuous, and categorical (single and multiple). We excluded 55 fields a priori, including age and sex, and technical variables (e.g., assessment center) (Supplementary Table 2).

Statistical analyses
Phenome-wide association study. We estimated the association of an Alzheimer's disease PRS with each phenotype in the three age strata using PHESANT (version 14). A description of PHESANT's automated rule-based method is published elsewhere 54 . We accounted for the multiple tests performed by generating adjusted P values, controlling for a 5% false discovery rate. The threshold (≤0.05) was used as a heuristic to identify variables for follow-up in the MR analysis and not as an indicator of significance 55,56 . Categories for the ordered categorical variables are in Table 3 of the Supplementary Methods. We also estimated the effects of genetic liability to AD on previously implicated risk factors for Alzheimer's disease, selecting four factors from the Global Burden of Disease Study (high BMI, high fasting plasma glucose, smoking, and a high intake of sugar-sweetened beverages) that contributed to metrics for deaths, prevalence, years of life lost, years of life lived with disability, and disability-adjusted life-years due to AD 3 . The review identified the following as potentially modifiable risk factors for dementia; less education, midlife hypertension, obesity and hearing loss, later life smoking, depression, physical inactivity, social isolation, and diabetes. Furthermore, a meta-analysis of case-control and population-based studies showed that rheumatoid arthritis is associated with a lower incidence of Alzheimer's disease 57 . The relationship between Alzheimer's disease and rheumatoid arthritis has been studied before using genetic-based methods such as MR 57 . Hence it is not examined here. We examined the use of methotrexate (an antiinflammatory drug for rheumatoid arthritis), due to observational studies 58,59 suggesting anti-inflammatory medicines for rheumatoid arthritis reduces the risk of Alzheimer's disease 58 . At the time of the analysis, plasma glucose was unavailable and not investigated.
Sensitivity analysis. We used the Nord-Trøndelag Health Study (HUNT) 60,61 to replicate the top PheWAS hits identified in the oldest age group (62-72 years) in UK Biobank, using the same ages for the stratification of the sample. The Trøndelag Health Study (HUNT) is a population-based study of~125,000 participants, which invited the entire adult (≥20 years) population of Trøndelag [61][62][63] . Adults were invited for questionnaires, interviews, clinical examinations, laboratory measurements, and storage of biological samples in at least one of four study rounds so far, including HUNT1 (1984)(1985)(1986) 62,63 . The current analysis includes genetic data from~90% (N = 71,860) of participants from HUNT2 and HUNT3 who were genotyped by genome-wide SNP arrays in 2015 60,64 . For the replication of Alzheimer's disease PRS-outcome associations in the HUNT study, we followed up 33 outcomes (i.e., those variables available in HUNT with sufficient sample numbers for replication) that were associated with the AD PRS in UK Biobank. In the HUNT study, there is a large amount of relatedness between participants 5 therefore, to avoid the need to exclude related participants and reduce the sample size, we used a method that accounts for the genetic relatedness using a restricted maximum likelihood (REML) approach 8 . We fit a linear mixed model where a genome-wide genetic relationship matrix (GRM) was used to account for the relatedness across the sample 60 . The models were adjusted for age, sex, and study participation round (if the outcome was measured in multiple rounds of HUNT study), batch, and ten principal components. Analyses were performed using R 4.0.3 (http://www.r-project.org) and GCTA software (version 1.93.3beta2) 8 . More details on the cohort and variable definition can be found in the Supplementary Methods. We repeated the PheWAS for the entire sample in UK Biobank without stratifying by age to maximize the power to detect associations. Furthermore, to examine if any detected associations could be attributed to the variants in or near the APOE gene, we repeated the PheWAS on the entire sample, excluding this region from the PRS.

Follow-up using Mendelian randomization
We investigated whether the variables identified in our PheWAS or previously reported risk factors 3 were a cause or consequence of Alzheimer's disease using bidirectional two-sample MR (details in the Supplementary Methods). MR is a method that uses alleles randomly allocated at conception as instrumental variables to estimate the causal effect of an exposure on an outcome 5 . MR is less prone to the bias of confounding and reverse causation associated with observational studies. However, for MR to produce unbiased causal effect estimates, each genetic variant that is used as an instrumental variable must fulfill three assumptions: (1) that it is associated with the exposure (relevance assumption), (2) that it is not associated with the outcome through a confounding pathway (exchangeability assumption), and (3) is only associated with the outcome through the exposure (exclusion restriction assumption). More details on terms related to MR can be found in the MR dictionary 65 .
In this MR analysis, we only considered risk factors identified by the PheWAS (in ages 62-72 years, which included the phenotypes identified in the younger age groups) and literature reviews. We identified SNPs that are strongly associated (P ≤ 5 × 10 −8 ) with each trait. SNPs in the APOE region 53 were removed from instruments proxying the exposures. For wheeze/whistling, we also examined the measured phenotype of forced vital capacity as a better measure of respiratory function. For spherical power, we derived four binary variables to indicate myopia (spherical power < −0.5) and hypertropia (spherical power > 0.5) in each eye. Exposure GWAS were based on summary statistics from UK Biobank and were performed with the BOLT-LMM software package 66 using a published pipeline 7 unless there was a larger published GWAS.
Alzheimer's disease GWAS. We used the same meta-analysis of the IGAP consortium 50 , ADSP 51 , and PGC 52 described above for the twosample MR analyses.
Estimating the effects of risk factors on Alzheimer's disease. We harmonized the exposure and outcome GWAS (details in Supplementary Methods). We estimated the effect of each exposure on Alzheimer's disease using MR and the inverse-variance weighted (IVW) estimator 67 . This estimator assumes that there is no directional horizontal pleiotropy (i.e., on average, the random effects on the outcome through pathways other than the exposure are not equal to zero) and that all the genetic variants are valid instrumental variables 5 . Furthermore, IVW uses weights that treat the genetic variant-exposure associations to be known rather than estimated (i.e., the No Measurement Error assumption). When genetic variants violate the NoME assumption, causal effect estimates may exhibit weak instrument bias, estimated with the F-statistic 68,69 . The F-statistic estimates the strength of association of the genetic variant with the exposure, indicating instrument strength (the larger the F-statistic, the stronger the instrument and the larger the statistical power) 68 . We present adjusted P values for inverse-variance weighted regression accounting for the number of results in the follow-up using the false discovery rate method.
Assessing pleiotropy. We investigated whether the SNPs had pleiotropic effects on the outcome other than through the exposure using MR Egger regression 70,71 . Egger regression allows for pleiotropic effects that are independent of the effect on the exposure of interest (InSIDE assumption) 70,72,73 . Egger regression is similar to IVW, except that it includes an intercept term representing the average pleiotropic effect. Similar to IVW, MR Egger also assumes no measurement error. To quantify the strength of the NOME violation for MR Egger, we report the I 2 G x statistic 74 , which indicates the expected relative bias of the MR Egger effect estimates.
Assessing causal direction. We used Steiger filtering to interrogate the direction of causation between Alzheimer's disease and the phenotypes identified in the PheWAS 10 . Steiger filtering assumes that a valid instrumental variable should explain more variance in the exposure (e.g., forced vital capacity) than the outcome (i.e., Alzheimer's disease), which should be true if the hypothesized direction from body mass index to Alzheimer's disease is true. We repeated MR analyses removing SNPs which explained more variance in the outcome than in the exposure.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The UK Biobank Study provided the data in this study (www. ukbiobank.ac.uk), received under the data request application no. 16729. The Alzheimer's disease GWAS included IGAP, ADSP, and PGC summary statistics. Summary statistics from IGAP are publicly available at http://web.pasteur-lille.fr/en/recherche/u744/igap/igap_ download.php. Summary statistics for ADSP can be obtained through a data access request https://dss.niagads.org/documentation/ data-application-and-submission/application-instructions/. Summary statistics from the PGC consortium are available at https://www.med. unc.edu/pgc/download-results/. The exposure GWAS in the follow-up MR studies were performed by Ben Elsworth and are publicly available at https://gwas.mrcieu.ac.uk. The GWAS on the blood-based biomarkers was performed by Roxanna Korologou-Linden, using the UK Biobank pipeline and can be provided upon request.